---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-05-09
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
redirect_from:
  - /topics/covid19/severity/2020-03-22-global_cfr_estimates.html
tags: [severity]
---
We aim to estimate the percentage of symptomatic COVID-19 cases reported in different countries, using case fatality ratio estimates based on data from the ECDC and correcting for delays between confirmation and death.
In real time, dividing deaths-to-date by cases-to-date leads to a biased estimate of the case fatality ratio (CFR), because this calculation accounts neither for delays from confirmation of a case to death nor for under-reporting of cases.
Using the distribution of the delay from hospitalisation-to-death for cases that are fatal, we can estimate how many cases so far are expected to have known outcomes (i.e. death or recovery), and hence adjust the naive estimates of CFR to account for these delays.
The adjusted CFR does not account for under-reporting. However, the best available estimates of CFR (adjusting or controlling for under-reporting) are in the 1%–1.5% range [1–4]. Large studies in China and South Korea estimated the CFR at 1.38% (95% CrI: 1.23–1.53%) [2] and 1.4% (95% CrI: 1.2–1.7%) [4], respectively. Based on these studies, and for simplicity, we assume a baseline CFR of 1.4% for our analysis.
If a country has an adjusted CFR that is higher (e.g. 20%), it suggests that only a fraction of cases have been reported (in this case, approximately 1.4/20 = 7.0% of cases).
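The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `reporting_fraction` is hypothetical, and capping the result at 100% is an assumption of the sketch (a cCFR below the baseline would otherwise imply more than all cases were reported).

```python
def reporting_fraction(ccfr, baseline_cfr=0.014):
    """Approximate fraction of symptomatic cases reported.

    ccfr: delay-adjusted CFR as a proportion (e.g. 0.20 for 20%).
    baseline_cfr: assumed true CFR (1.4% in the text).
    """
    if ccfr <= 0:
        raise ValueError("ccfr must be positive")
    # Assumption of this sketch: cap at 1.0 so that a cCFR below the
    # baseline does not imply more than 100% of cases reported.
    return min(1.0, baseline_cfr / ccfr)


print(f"{reporting_fraction(0.20):.1%}")  # 7.0%
```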
Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Generalised Additive Model to these data (see Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The shaded region is the 95% CI of the fitted GAM.
Figure 2: Plotting the estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates. Blue shading is the 2.5% - 97.5% confidence range.
| Country | Percentage of symptomatic cases reported (95% CI) | Total cases | Total deaths |
|---|---|---|---|
| Afghanistan | 18% (12% - 44%) | 1949 | 60 |
| Albania | 21% (13% - 50%) | 766 | 31 |
| Algeria | 6.3% (5% - 12%) | 3848 | 444 |
| Andorra | 18% (12% - 39%) | 753 | 42 |
| Argentina | 15% (11% - 29%) | 4272 | 214 |
| Armenia | 50% (30% - 100%) | 1932 | 30 |
| Australia | 84% (59% - 100%) | 6746 | 90 |
| Austria | 29% (23% - 46%) | 15364 | 580 |
| Azerbaijan | 63% (36% - 100%) | 1766 | 23 |
| Bahamas | 6% (3.2% - 20%) | 80 | 11 |
| Bangladesh | 18% (13% - 45%) | 7103 | 163 |
| Belarus | 73% (50% - 100%) | 13181 | 84 |
| Belgium | 5.4% (4.6% - 9.3%) | 47859 | 7501 |
| Bolivia | 11% (7.2% - 26%) | 1110 | 59 |
| Bosnia and Herzegovina | 22% (15% - 47%) | 1588 | 62 |
| Brazil | 7.6% (6.4% - 15%) | 78162 | 5466 |
| Bulgaria | 16% (11% - 36%) | 1447 | 64 |
| Burkina Faso | 14% (9.3% - 31%) | 641 | 43 |
| Cameroon | 20% (13% - 45%) | 1832 | 61 |
| Canada | 12% (10% - 23%) | 51587 | 2996 |
| Chile | 50% (37% - 100%) | 14885 | 216 |
| China | 24% (20% - 32%) | 83944 | 4637 |
| Colombia | 15% (11% - 30%) | 6211 | 278 |
| Côte d'Ivoire | 63% (32% - 100%) | 1238 | 14 |
| Croatia | 30% (20% - 61%) | 2062 | 67 |
| Cuba | 18% (12% - 43%) | 1467 | 58 |
| Cyprus | 40% (23% - 100%) | 843 | 20 |
| Czechia | 33% (25% - 59%) | 7579 | 227 |
| Democratic Republic of the Congo | 12% (7.2% - 30%) | 500 | 31 |
| Denmark | 18% (14% - 32%) | 9008 | 443 |
| Dominican Republic | 16% (13% - 33%) | 6652 | 293 |
| Ecuador | 13% (10% - 26%) | 24675 | 883 |
| Egypt | 8.8% (6.9% - 18%) | 5268 | 380 |
| Estonia | 33% (22% - 69%) | 1666 | 50 |
| Finland | 20% (15% - 38%) | 4906 | 206 |
| France | 5.1% (4.3% - 8.2%) | 128442 | 24087 |
| Germany | 25% (21% - 40%) | 159119 | 6288 |
| Ghana | 62% (33% - 100%) | 1671 | 16 |
| Greece | 19% (14% - 34%) | 2576 | 139 |
| Guatemala | 20% (11% - 67%) | 585 | 16 |
| Guernsey | 19% (9.9% - 56%) | 251 | 13 |
| Honduras | 7.6% (5.3% - 16%) | 771 | 71 |
| Hungary | 6.8% (5.3% - 13%) | 2775 | 312 |
| India | 17% (14% - 35%) | 33050 | 1074 |
| Indonesia | 8.7% (7% - 17%) | 9771 | 784 |
| Iran | 15% (13% - 24%) | 93657 | 5957 |
| Iraq | 19% (13% - 37%) | 2003 | 92 |
| Ireland | 13% (11% - 24%) | 20253 | 1190 |
| Isle of Man | 14% (8.1% - 37%) | 313 | 21 |
| Israel | 68% (51% - 100%) | 15834 | 215 |
| Italy | 7.3% (6.2% - 11%) | 203591 | 27682 |
| Japan | 26% (21% - 51%) | 14088 | 415 |
| Jersey | 12% (7.2% - 32%) | 286 | 21 |
| Kazakhstan | 77% (45% - 100%) | 3205 | 25 |
| Kenya | 19% (10% - 59%) | 384 | 15 |
| Kosovo | 26% (15% - 74%) | 799 | 22 |
| Kuwait | 87% (50% - 100%) | 3740 | 24 |
| Latvia | 54% (28% - 100%) | 849 | 15 |
| Lebanon | 32% (19% - 74%) | 721 | 24 |
| Liberia | 5.5% (3.2% - 18%) | 141 | 16 |
| Lithuania | 31% (20% - 67%) | 1375 | 45 |
| Luxembourg | 45% (31% - 84%) | 3769 | 89 |
| Malaysia | 59% (42% - 100%) | 5945 | 100 |
| Mali | 9.7% (5.9% - 29%) | 482 | 25 |
| Mexico | 5.4% (4.4% - 11%) | 17799 | 1732 |
| Moldova | 23% (17% - 51%) | 3771 | 111 |
| Morocco | 18% (13% - 38%) | 4321 | 168 |
| Netherlands | 7.4% (6.1% - 12%) | 38802 | 4711 |
| New Zealand | 65% (36% - 100%) | 1129 | 19 |
| Niger | 20% (12% - 48%) | 713 | 32 |
| Nigeria | 15% (9.9% - 40%) | 1728 | 51 |
| North Macedonia | 17% (11% - 36%) | 1442 | 73 |
| Norway | 40% (30% - 69%) | 7667 | 202 |
| Pakistan | 27% (21% - 56%) | 15759 | 346 |
| Panama | 27% (20% - 54%) | 6378 | 178 |
| Peru | 18% (14% - 38%) | 33931 | 943 |
| Philippines | 12% (9.4% - 22%) | 8212 | 558 |
| Poland | 15% (12% - 29%) | 12640 | 624 |
| Portugal | 22% (18% - 39%) | 24505 | 973 |
| Puerto Rico | 14% (9.6% - 29%) | 1433 | 86 |
| Romania | 14% (11% - 25%) | 11978 | 675 |
| Russia | 48% (39% - 100%) | 99399 | 972 |
| San Marino | 12% (8% - 27%) | 563 | 41 |
| Saudi Arabia | 68% (50% - 100%) | 21402 | 157 |
| Serbia | 36% (27% - 76%) | 8724 | 173 |
| Singapore | 100% (100% - 100%) | 15641 | 14 |
| Sint Maarten | 5.1% (2.9% - 16%) | 76 | 13 |
| Slovakia | 54% (31% - 100%) | 1391 | 22 |
| Slovenia | 17% (12% - 31%) | 1418 | 89 |
| Somalia | 7.7% (4.8% - 26%) | 582 | 28 |
| South Africa | 35% (25% - 74%) | 5350 | 103 |
| South Korea | 55% (42% - 83%) | 10765 | 247 |
| Spain | 8.5% (7.2% - 14%) | 212917 | 24275 |
| Sudan | 3.9% (2.5% - 13%) | 375 | 28 |
| Sweden | 6.3% (5.2% - 11%) | 20302 | 2462 |
| Switzerland | 22% (18% - 35%) | 29324 | 1407 |
| Thailand | 58% (38% - 100%) | 2954 | 54 |
| Tunisia | 23% (15% - 51%) | 980 | 40 |
| Turkey | 28% (24% - 53%) | 117589 | 3081 |
| Ukraine | 24% (18% - 53%) | 9866 | 250 |
| United Arab Emirates | 74% (52% - 100%) | 11929 | 98 |
| United Kingdom | 4.8% (4% - 8.4%) | 165221 | 26097 |
| United Republic of Tanzania | 12% (6.4% - 42%) | 480 | 16 |
| United States of America | 13% (11% - 23%) | 1039909 | 60966 |
| Uruguay | 40% (21% - 100%) | 630 | 15 |
Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. The 95% confidence intervals were calculated using an exact binomial test.
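As a rough illustration of how an exact binomial interval on the cCFR could translate into an interval on the percentage reported, the sketch below computes a Clopper-Pearson interval in plain Python. It assumes the test is applied to deaths out of cases with known outcomes, and that the bounds are then divided into the 1.4% baseline; both are assumptions of this sketch, as are all function names and the example counts.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def _solve_decreasing(func, target, iters=60):
    """Bisection on [0, 1] for func(p) = target, where func decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(deaths, known, alpha=0.05):
    """Clopper-Pearson interval for the proportion deaths / known."""
    lower = 0.0 if deaths == 0 else _solve_decreasing(
        lambda p: binom_cdf(deaths - 1, known, p), 1 - alpha / 2)
    upper = 1.0 if deaths == known else _solve_decreasing(
        lambda p: binom_cdf(deaths, known, p), alpha / 2)
    return lower, upper


def reporting_interval(deaths, known, baseline_cfr=0.014):
    """Interval for the fraction reported, capped at 1.0 (an assumption)."""
    lower, upper = exact_binomial_ci(deaths, known)
    # A higher cCFR implies a lower reported fraction, so the bounds swap.
    lo_rep = min(1.0, baseline_cfr / upper)
    hi_rep = 1.0 if lower == 0.0 else min(1.0, baseline_cfr / lower)
    return lo_rep, hi_rep


# Hypothetical counts: 60 deaths among 300 cases with known outcomes.
print(reporting_interval(60, 300))
```

Because the reported fraction is inversely proportional to the cCFR, a cCFR interval that reaches down to the baseline produces an upper bound of 100%, which matches the pattern seen in several rows of Table 1.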
During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation-to-death [1].
We assumed the delay from confirmation-to-death followed the same distribution as the estimated hospitalisation-to-death delay, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
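To use this delay distribution with daily data, the reported mean and standard deviation need converting to lognormal parameters and then binning by day. The sketch below shows one way to do this, using the standard moment-matching formulas for the lognormal. Binning the delay as \(P(j \le \text{delay} < j+1)\) and truncating at 60 days are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def lognormal_params(mean, sd):
    """Convert a natural-scale mean and SD to (meanlog, sdlog)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(x, meanlog, sdlog):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def daily_delay_probs(mean, sd, max_days):
    """f_j = P(j <= delay < j + 1) for j = 0, ..., max_days - 1."""
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_days)]


# Mean 13 days and SD 12.7 days, as stated in the text.
f = daily_delay_probs(13.0, 12.7, 60)
print(round(sum(f), 3))  # probability mass captured within the first 60 days
```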
To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:
\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]
where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the cumulative number of cases in the denominator of the cCFR calculation, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
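The numerator of the expression above is a discrete convolution of daily cases with the delay distribution. The following sketch computes it on toy data and uses it as the denominator of a corrected CFR. Summing known outcomes over all days to form the denominator is an assumption of this sketch, and the case counts, death counts and four-day delay distribution are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def known_outcomes(cases, f):
    """Expected cases with a known outcome on each day: sum_j c_{t-j} * f_j."""
    out = []
    for t in range(len(cases)):
        total = 0.0
        # Only delays up to t days (and within the support of f) contribute.
        for j in range(min(t, len(f) - 1) + 1):
            total += cases[t - j] * f[j]
        out.append(total)
    return out


def corrected_cfr(cases, deaths, f):
    """Total deaths divided by total expected cases with known outcomes."""
    return sum(deaths) / sum(known_outcomes(cases, f))


# Toy inputs, purely illustrative.
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 2, 4]
f = [0.1, 0.2, 0.3, 0.4]  # hypothetical delay distribution

naive = sum(deaths) / sum(cases)
ccfr = corrected_cfr(cases, deaths, f)
print(f"nCFR = {naive:.3f}, cCFR = {ccfr:.3f}")
```

With rising case counts, many recent cases do not yet have a known outcome, so the corrected denominator is smaller than the raw case total and the cCFR comes out above the nCFR.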
At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. The best available estimates that also adjust for under-reporting range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2–1.7%), taken from a recent large study [3], as a baseline CFR, and use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate the approximate fraction of cases reported.
We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Generalised Additive Model (GAM) of the form \[ \log(\mathbb{E}[D]) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p, \] specifying a Poisson distribution on deaths (\(D\)) as the response variable. The model has a log-link function and a log-offset, \(\log(\kappa)\), consisting of the daily known outcomes \(u_t c_t\) and the cCFR estimate for that country on that day, \(\text{cfr}_t\). The model can then be written as \[ D \sim s(t) + \underbrace{\log(u_t c_t) + \log(\text{cfr}_t)}_{:= \log(\kappa)} \] where \(s(t)\) is a smoothing spline, fitted through the time points (days) for which we have data.
Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is likely to contribute to CFR estimates higher than 1.4%, along with many other country-specific factors.
The following is a list of the other prominent assumptions made in our analysis:
We assume that people get tested upon hospitalisation. A few examples where this is not the case are Germany and South Korea, where people can get tested earlier.
We assume that the hospitalisation-to-death delay from early Wuhan is representative of all other countries (by using the distribution parameterised using early Wuhan data) and that all countries have the same risk and age profile as Wuhan.
Severity of COVID-19 is known to increase with age. Therefore, countries with older populations will naturally see higher death rates. We are extending this analysis to adjust for the age distribution for countries with more than five reported deaths and where age distribution data is available.
All results are tied to, and potentially biased by, the baseline CFR, which is assumed to be 1.4% [3].
The under-reporting estimate is sensitive to the baseline CFR, meaning that small errors in it lead to large errors in the estimate for under-reporting.
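To make this sensitivity concrete, here is a short worked calculation (an illustration added for clarity, not part of the original analysis). Writing \(b\) for the baseline CFR and \(r = b/\text{cCFR}\) for the estimated fraction of cases reported, a relative error in \(b\) carries over one-for-one into \(r\): \[ \frac{\Delta r}{r} = \frac{\Delta b}{b}. \] For the illustrative adjusted CFR of 20%, moving the baseline across the 1%–1.5% range of available estimates gives \[ \frac{1.0\%}{20\%} = 5.0\%, \qquad \frac{1.4\%}{20\%} = 7.0\%, \qquad \frac{1.5\%}{20\%} = 7.5\%, \] so a baseline of 1.0% rather than 1.4% would lower every country's estimate by about 29%.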
There are several sources of uncertainty in this analysis: the reported mean and SD of the delay distribution, the cCFR for each country (on each day), and the baseline CFR. We use the lower and upper reported values of the mean and SD of the delay distribution to produce the widest interval. The other two uncertainties are carried through directly into the subsequent calculations, which is crude. An ongoing extension develops this into a fully Bayesian model, which deals with these different uncertainties more rigorously.
The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC, using the NCoVUtils package [8].
1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.
2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.
3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.
4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.
5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.
6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.
7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.
8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.